[move-compiler] Added dot chain parsing resilience #17106

awelc · 2024-04-10T02:13:51Z

Description

This PR adds parsing resilience to dot chains (e.g., some_struct.some_field). It's a prerequisite for adding auto-completion for struct fields and struct functions (aka methods) to the IDE.

Implementation-wise, I also tried making parse_identifier parser-resilient (having it always return a value instead of a Result). This does not quite work, though, since in parse_dot_or_index_chain we need to know that if an identifier fails to parse, we should stop parsing the chain itself. If parse_identifier never returned an error, we would simply continue parsing the chain and generate weird errors.
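
A minimal sketch of why the Result matters, using a toy token type rather than the compiler's own. Tok, the signatures, and the string diagnostics below are assumptions made for illustration; only the names parse_identifier and the chain-parsing idea come from the description above.

```rust
// Toy model: none of these types are the move-compiler's actual types.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Ident(String),
    Dot,
    Semi,
}

/// Returns the identifier at `pos`, or an error message if the token there is not one.
fn parse_identifier(toks: &[Tok], pos: &mut usize) -> Result<String, String> {
    match toks.get(*pos) {
        Some(Tok::Ident(name)) => {
            *pos += 1;
            Ok(name.clone())
        }
        other => Err(format!("expected identifier, found {:?}", other)),
    }
}

/// Parses `a.b.c`, recording a diagnostic and stopping at the first bad segment.
fn parse_dot_chain(toks: &[Tok], diags: &mut Vec<String>) -> Vec<String> {
    let mut pos = 0;
    let mut segments = Vec::new();
    match parse_identifier(toks, &mut pos) {
        Ok(first) => segments.push(first),
        Err(diag) => {
            diags.push(diag);
            return segments;
        }
    }
    while toks.get(pos) == Some(&Tok::Dot) {
        pos += 1; // consume the dot
        match parse_identifier(toks, &mut pos) {
            Ok(name) => segments.push(name),
            Err(diag) => {
                // The error is the signal to stop extending the chain.
                diags.push(diag);
                break;
            }
        }
    }
    segments
}
```

For the token sequence of `s.f.;`, this toy parser keeps the segments `s` and `f`, records one diagnostic, and stops. If parse_identifier always returned a value, the loop would have no signal to stop on.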

Test plan

An additional compiler test has been added. Also a symbolicator test has been added to verify that prefixes of partially parsed dot chains are parsed correctly.

Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.

For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.

Protocol:
Nodes (Validators and Full nodes):
Indexer:
JSON-RPC:
GraphQL:
CLI: The compiler may now report more diagnostics for code containing dot chains (for example, some_struct.some_field), because parsing errors in a dot chain no longer stop compilation. The compiler can therefore reach later compilation stages, where it might generate additional diagnostics.
Rust SDK:

tnowacki

Seems rather subtle and I don't quite follow all of it, so just some initial questions :)

tnowacki · 2024-04-10T17:45:05Z

external-crates/move/crates/move-compiler/src/parser/syntax.rs

@@ -117,12 +117,6 @@ fn unexpected_token_error_(
    ))
 }

-fn add_type_args_ambiguity_label(loc: Loc, mut diag: Box<Diagnostic>) -> Box<Diagnostic> {


Why did this go away?

It was no longer used after the "main" parser resilience PR (#16673) landed, because generating this additional label was predicated on parse_comma_list returning an error, which it no longer does. The current PR just makes this more explicit, as parse_optional_type_args (which wraps parse_comma_list) now also no longer returns an error.

tnowacki · 2024-04-10T17:46:06Z

external-crates/move/crates/move-compiler/src/parser/syntax.rs

+                                context.add_diag(*diag);
+                                let end_loc = context.tokens.previous_end_loc();
+                                let loc = make_loc(context.tokens.file_hash(), start_loc, end_loc);
+                                sp(loc, Symbol::from(""))


What's this symbol here?

What we are trying to do here is to parse a piece of code like this:

fun foo(s: SomeStruct) { s. }

This is happening in parse_dot_or_index_chain which contains a loop parsing a potentially multi-segment dot chain (e.g., s.foo().f).

Considering the example above, we would first (before the loop even starts) parse s, then encounter a dot and try to parse whatever comes after the dot. In our example, this would fail because } is encountered. Still, we want the partial access chain to be available for auto-completion so that we can determine the type of s, so we "complete" the chain with a fake (empty) identifier and keep going with the parsing.
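
A rough sketch of the "fake identifier" idea, with stand-in types. Name, Exp, and complete_dot are hypothetical and only mirror the shape of the diff above, where an empty symbol is paired with the location between the start of the chain and the previous token.

```rust
// Toy model: `Name` and `Exp` are stand-ins, not the compiler's types.
#[derive(Debug, PartialEq)]
struct Name {
    start: usize,
    end: usize,
    value: String,
}

#[derive(Debug, PartialEq)]
enum Exp {
    Var(String),
    Dot(Box<Exp>, Name),
}

/// Builds the node for `lhs.<field>`. When no identifier follows the dot,
/// an empty name spanning `start..end` is used so that the node still exists.
fn complete_dot(lhs: Exp, field: Option<Name>, start: usize, end: usize) -> Exp {
    let name = field.unwrap_or(Name {
        start,
        end,
        value: String::new(),
    });
    Exp::Dot(Box::new(lhs), name)
}
```

The node for `s.` survives, so later passes can still see s and its type. The cost is that any later check that looks up the field by name sees an empty string.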

tnowacki · 2024-04-10T17:46:26Z

external-crates/move/crates/move-compiler/src/parser/syntax.rs

+                        let n = match parse_identifier(context) {
+                            Ok(id) => id,
+                            Err(diag) => {
+                                done_parsing = true;


Why not just break?

If you consider the example I put together in the other response, breaking here (instead of completing construction of Exp_::Dot) would not accomplish our goal, as the AST node for the s. statement would not be created.

tnowacki · 2024-04-10T20:18:11Z

external-crates/move/crates/move-compiler/src/parser/syntax.rs

+                        let n = match parse_identifier(context) {
+                            Ok(id) => id,
+                            Err(diag) => {
+                                done_parsing = true;


Should we just return then?

tnowacki · 2024-04-10T20:18:27Z

external-crates/move/crates/move-compiler/src/parser/syntax.rs

    loop {
+        if done_parsing {
+            break;
+        }


This is a while loop :)
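
To spell out the suggestion with a self-contained example (the function names and the counting task are made up for illustration): a loop that checks a flag at the top and breaks behaves the same as `while !done`.

```rust
/// Counts leading non-negative items using a flag checked at the top of a `loop`.
fn leading_non_negative_flag(items: &[i32]) -> usize {
    let mut count = 0;
    let mut done = false;
    loop {
        if done {
            break;
        }
        if count < items.len() && items[count] >= 0 {
            count += 1;
        } else {
            done = true;
        }
    }
    count
}

/// The same control flow written as `while !done`.
fn leading_non_negative_while(items: &[i32]) -> usize {
    let mut count = 0;
    let mut done = false;
    while !done {
        if count < items.len() && items[count] >= 0 {
            count += 1;
        } else {
            done = true;
        }
    }
    count
}
```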

tnowacki · 2024-04-10T20:20:19Z

external-crates/move/crates/move-compiler/src/parser/syntax.rs

+                                context.add_diag(*diag);
+                                let end_loc = context.tokens.previous_end_loc();
+                                let loc = make_loc(context.tokens.file_hash(), start_loc, end_loc);
+                                sp(loc, Symbol::from(""))


But we will end up with a really strange error message Unbound field '' in 'a::m::AnotherStruct'

Which feels like something you maybe want to suppress? Could we add a new node or some sort of error node that the IDE can latch onto?

tnowacki

Just one comment, and I think we would benefit from getting some thoughts from @cgswords.
Otherwise looks good!

tnowacki · 2024-04-11T22:19:05Z

external-crates/move/crates/move-compiler/src/typing/translate.rs

+            ExpDottedAccess::UnresolvedError => {
+                assert!(idx == num_accessors - 1);
+                break;


This feels like it will not be enough for the IDE. I would imagine you will need something like UnannotatedExp_::InvalidAccess(Box<Exp>, Option<Name>) which will let you float access to invalid fields and invalid methods

I am not sure I understand the limitation here and was hoping you could give an example of when this is not going to be sufficient.

I was going for the IDE to be able to understand what all valid prefixes of an otherwise only partially parsed chain are. As per the added test, it should understand what s (in s.;) and s.another_field (in s.another_field.;) are, even if parsing fails after the final dot (as for auto-completion we don't really care what "wrong thing" comes after the final dot):

fun foo(s: AnotherStruct) {
    let _tmp1 = s.;               // incomplete with `;` (next line should parse)
    let _tmp2 = s.another_field.; // incomplete with `;` (next line should parse)
}

If you have s.foo_ in the IDE, you want all fields and functions that are prefixed by foo_

For me personally, I never do s. and wait, I always think I know the name of the function or field, and then am sadly mistaken. But autocorrect picks up about halfway through me typing.

Got it. From what I understand, though, integration with the IDE will work in a slightly different way. There is only one trigger event for auto-completion, and it's the appearance of the . character. Whatever you type afterwards does not trigger further auto-completion events. This means that the LSP implementation needs to provide all possible completions upon just seeing . (as otherwise it will not have another chance). It may appear as if it's the LSP that does the filtering (in your example, showing only functions prefixed by foo_), but it's the "client" side that actually does it.
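
A small sketch of that split of responsibilities. Both functions are hypothetical and do not correspond to any real LSP library API: the "server" returns every member once on `.`, and the "client" narrows the list as more characters are typed.

```rust
/// Server side (hypothetical): on `.`, return every member of the receiver's type.
fn all_completions(fields: &[&str], methods: &[&str]) -> Vec<String> {
    fields
        .iter()
        .chain(methods.iter())
        .map(|s| s.to_string())
        .collect()
}

/// Client side (hypothetical): filter the already-received list by what was typed.
fn filter_by_prefix(items: &[String], typed: &str) -> Vec<String> {
    items
        .iter()
        .filter(|s| s.starts_with(typed))
        .cloned()
        .collect()
}
```

With fields `foo_field` and `bar` and a method `foo_fn`, typing `foo_` after the dot would leave `foo_field` and `foo_fn` without another request to the server.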

I can include IDE integration in this PR and make sure that everything works as expected but per @cgswords's previous request I am trying to keep the compiler PRs and IDE PRs separate.

OK, I take it back :-) I thought it over and now I am understanding much better what you had in mind! My argument still partially stands, but there are cases when we'd need the "invalid" names. More changes (likely) coming!

I reworked things a bit. We indeed need a new UnannotatedExp_::InvalidAccess to support a dot chain with no proper identifier after the dot (e.g., tmp.;), even if the dot is on a separate line. That's why we need to preserve the dot's location in conjunction with the type of the expression that precedes it: the symbolicator can then get the preceding expression's type having only the location of the dot at its disposal.

I don't think we need anything new to represent a dot chain with an "invalid" field or method name (e.g., tmp.invalid where invalid does not represent a field or a method name). This will still parse as an Exp_::Dot (rather than Exp_::DotUnresolved) but with UnresolvedType instead of some real one, and we just have to make sure that we process these correctly in the symbolicator.
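
The distinction can be sketched with a toy parser. The types below are assumptions for illustration: the real Exp_ has many more variants, and a token index stands in for the dot's source location.

```rust
// Toy model: not the compiler's token or AST types.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Ident(String),
    Dot,
    Semi,
}

#[derive(Debug, PartialEq)]
enum Exp {
    Var(String),
    /// `lhs.name`, whether or not `name` turns out to be a real field or method.
    Dot(Box<Exp>, String),
    /// `lhs.` with nothing parsable after the dot; the index is the dot's position.
    DotUnresolved(usize, Box<Exp>),
}

/// Parses a dot chain, ending it with `DotUnresolved` if a dot has no identifier.
fn parse_chain(toks: &[Tok], diags: &mut Vec<String>) -> Option<Exp> {
    let mut pos = 0;
    let mut exp = match toks.get(pos) {
        Some(Tok::Ident(name)) => {
            pos += 1;
            Exp::Var(name.clone())
        }
        _ => {
            diags.push("expected identifier".to_string());
            return None;
        }
    };
    while toks.get(pos) == Some(&Tok::Dot) {
        let dot_pos = pos;
        pos += 1; // consume the dot
        match toks.get(pos) {
            Some(Tok::Ident(name)) => {
                pos += 1;
                exp = Exp::Dot(Box::new(exp), name.clone());
            }
            _ => {
                diags.push("expected identifier after '.'".to_string());
                exp = Exp::DotUnresolved(dot_pos, Box::new(exp));
                break;
            }
        }
    }
    Some(exp)
}
```

In this sketch `tmp.invalid` still produces a plain Dot node, since checking whether `invalid` names a real member is left to a later pass, while `tmp.;` produces DotUnresolved carrying the dot's position.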

tnowacki · 2024-04-11T22:20:04Z

external-crates/move/crates/move-compiler/src/parser/syntax.rs

+                    _ => match parse_identifier(context) {
+                        Err(diag) => {
+                            context.add_diag(*diag);
+                            Exp_::DotUnresolved(Box::new(lhs))


Looks like this turned out relatively cleanly!

cgswords · 2024-04-16T16:44:25Z

external-crates/move/crates/move-compiler/src/expansion/ast.rs

@@ -1734,6 +1735,10 @@ impl AstDebug for ExpDotted_ {
                w.comma(&rhs.value, |w, e| e.ast_debug(w));
                w.write("]");
            }
+            D::DotUnresolved(_, e) => {
+                e.ast_debug(w);
+                w.write("")


Nit: have this print something more clear

It was supposed to be (fixed now):

e.ast_debug(w); w.write("")

cgswords · 2024-04-16T17:01:17Z

external-crates/move/crates/move-compiler/src/parser/syntax.rs

+                            Exp_::DotUnresolved(loc, Box::new(lhs))
+                        }
+                        Ok(n) => {
+                            if is_start_of_call_after_function_name(context, &n) {


Do we want to check this either way so that we support s.foo.(x, y)? I can imagine trying to get IDE help by having s.foo.bar(x, y) and removing bar to try to get autocomplete.

I think this will work without any changes as s.foo.(x, y) will parse to the DotUnresolved case and auto-completion will work correctly.

Can you add a test?

Added this to the symbolicator tests, but I am also planning to have more comprehensive auto-completion tests once auto-completion features start landing (i.e., unit tests that check what auto-completion would actually insert in these partially parsed cases).

cgswords · 2024-04-16T17:08:19Z

external-crates/move/crates/move-compiler/src/parser/syntax.rs

+                                let is_macro = if let Tok::Exclaim = context.tokens.peek() {
+                                    let loc = current_token_loc(context.tokens);
+                                    context.advance();
+                                    Some(loc)
+                                } else {
+                                    None
+                                };
+                                let mut tys = None;
+                                if context.tokens.peek() == Tok::Less
+                                    && n.loc.end() as usize == call_start
+                                {
+                                    tys = parse_optional_type_args(context);
+                                }


Is there a reason not to just call parse_macro_opt_and_tyargs_opt here?

Not that I can see. Previously, I just used what was already there, but now I rewrote it to use parse_macro_opt_and_tyargs_opt.

cgswords

A few small follow-ups.

One broader question is: why do we maintain DotUnresolved so deep into the compiler? We don't appear to be using it anywhere yet. Can you say what the plan is?

awelc · 2024-04-17T00:24:49Z

A few small follow-ups.

One broader question is: why do we maintain DotUnresolved so deep into the compiler? We don't appear to be using it anywhere yet. Can you say what the plan is?

For an incomplete dot chain that does not have a parsable identifier after the dot (e.g., tmp.;), we need to carry the location of the dot and the type of the dot's prefix to the typing AST so that the symbolicator can retrieve the right type information for dot auto-completion. I have been implementing this in the symbolicator on the side to see if we have all we need, and it seems to be working as expected.
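
One way to picture the lookup this enables, as a sketch only. DotLoc, DotTypes, and the string-typed "type" are hypothetical and say nothing about how the symbolicator actually stores this information.

```rust
use std::collections::HashMap;

/// Hypothetical source location of a dot: (line, column).
type DotLoc = (u32, u32);

/// Maps the location of each unresolved dot to the type of the expression before it.
#[derive(Default)]
struct DotTypes {
    by_loc: HashMap<DotLoc, String>,
}

impl DotTypes {
    /// Records the prefix type seen while walking the typed AST.
    fn record(&mut self, dot: DotLoc, prefix_type: &str) {
        self.by_loc.insert(dot, prefix_type.to_string());
    }

    /// Looks up the prefix type knowing only where the dot is.
    fn type_before_dot(&self, dot: DotLoc) -> Option<&str> {
        self.by_loc.get(&dot).map(String::as_str)
    }
}
```

When a completion request arrives at a dot's position, the prefix type is found from the location alone, which is why both pieces need to reach the typing AST together.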

cgswords · 2024-04-19T17:40:19Z

external-crates/move/crates/move-compiler/tests/move_2024/parser/dot_incomplete.move

+        let _tmp2 = s.another_field.;  // incomplete with `;` (next line should parse)
+        let _tmp3 = s.another_field.   // incomplete without `;` (unexpected `let`)
+        let _tmp4 = s;
+        let _tmp = s.                  // incomplete without `;` (unexpected `}`)


This error is a bit weird because it just says it expected an identifier, when a number is also allowed. Maybe a last small thing to fix?

Good point. This was the original behavior, though, and I simply kept it. That being said, it definitely makes sense to be more precise here so I changed it to generate a diagnostic according to your suggestion.

cgswords

LGTM. One last little request.

awelc requested review from tnowacki and cgswords April 10, 2024 02:13

awelc requested review from cgswords and tzakian and removed request for cgswords April 10, 2024 02:14

awelc self-assigned this Apr 10, 2024

awelc requested a review from dariorussi April 10, 2024 02:14

tnowacki reviewed Apr 10, 2024

vercel bot deployed to Preview – sui-core April 11, 2024 17:52 View deployment

awelc marked this pull request as ready for review April 11, 2024 17:55

vercel bot deployed to Preview – sui-core April 11, 2024 19:34 View deployment

awelc requested a review from tnowacki April 11, 2024 20:03

tnowacki reviewed Apr 11, 2024

awelc requested a review from tnowacki April 12, 2024 17:37

vercel bot deployed to Preview – sui-core April 15, 2024 21:55 View deployment

awelc force-pushed the aw/compiler-dot-resilience branch from c0fab10 to 992569e Compare April 15, 2024 21:57

vercel bot deployed to Preview – sui-core April 15, 2024 21:58 View deployment

cgswords reviewed Apr 16, 2024

cgswords reviewed Apr 16, 2024

cgswords requested changes Apr 16, 2024

vercel bot deployed to Preview – sui-core April 17, 2024 00:26 View deployment

vercel bot deployed to Preview – sui-core April 17, 2024 00:43 View deployment

awelc requested a review from cgswords April 17, 2024 01:03

cgswords reviewed Apr 19, 2024

cgswords approved these changes Apr 19, 2024

vercel bot deployed to Preview – sui-core April 19, 2024 22:09 View deployment

awelc force-pushed the aw/compiler-dot-resilience branch from 7ee7c75 to 4d29768 Compare April 19, 2024 22:16

vercel bot deployed to Preview – sui-core April 19, 2024 22:24 View deployment

awelc added 8 commits April 19, 2024 17:11

[move-compiler] Added dot chain parsing resilience

b50f15b

Introduced partially parsed dot chain AST node

4fe5f54

Removed forgotten debug print

0320c49

A minor test fix

d683704

Added additional info to the error case

980ce0e

Addressed review comments

8b292ba

Fixed analyzer tests

0665e4f

Addressed review comments

61466e4

awelc force-pushed the aw/compiler-dot-resilience branch from 4d29768 to 61466e4 Compare April 20, 2024 03:11

vercel bot deployed to Preview – sui-core April 20, 2024 03:12 View deployment

awelc enabled auto-merge (squash) April 20, 2024 03:25

awelc merged commit 9a202f1 into main Apr 20, 2024
47 checks passed

awelc deleted the aw/compiler-dot-resilience branch April 20, 2024 03:40

awelc commented Apr 10, 2024 • edited by ronny-mysten
